{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c0e56fcb",
   "metadata": {},
   "source": [
    "# Multi-LoRA inference with NVIDIA NIM\n",
    "\n",
    "This is a demonstration of deploying multiple LoRA adapters with NVIDIA NIM. NIM supports LoRA adapters in .nemo (from NeMo Framework), and Hugging Face model formats. \n",
    "\n",
    "We will deploy the PubMedQA LoRA adapter from previous notebook, alongside two other previously trained LoRA adapters (GSM8K, SQuAD) that are available on NVIDIA NGC as examples.\n",
    "\n",
    "`NOTE`: While it's not necessary to complete the LoRA training and obtain the adapter from the previous notebook (\"Creating a LoRA adapter with NeMo Framework\") to follow along with this one, it is recommended if possible. You can still learn about LoRA deployment with NIM using the other adapters downloaded from NGC."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d95c164c-b7f2-41d8-8ce3-67656f7bee83",
   "metadata": {
    "tags": []
   },
   "source": [
    "This notebook includes instructions to send an inference call to NVIDIA NIM using the Python `requests` library."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5fbf9e2-220b-4677-8a5c-68bba94858c8",
   "metadata": {},
   "source": [
    "## Before you begin\n",
    "Ensure that you satisfy the pre-requisites, and have completed the setup instructions provided in the README associated with this tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "144d8f05-9dad-425a-9ee8-7b54d7554569",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c83ea9c9-3ef4-4911-8bd3-cb9457dba5d6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "import json"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f09747b0",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Check available LoRA models\n",
    "\n",
    "Once the NIM server is up and running, check the available models as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4489179d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "url = 'http://0.0.0.0:8000/v1/models'\n",
    "\n",
    "response = requests.get(url)\n",
    "data = response.json()\n",
    "\n",
    "print(json.dumps(data, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db8f40b4-7b43-4781-bf95-bf566a843422",
   "metadata": {},
   "source": [
    "This will return all the models available for inference by NIM. In this case, it will return the base model `meta/llama3-8b-instruct`, as well as the LoRA adapters that were provided during NIM deployment - `llama3-8b-pubmed-qa` (if applicable), `llama3-8b-instruct-lora_vnemo-math-v1`, and `llama3-8b-instruct-lora_vnemo-squad-v1`. Note that their names match the folder names where their .nemo files are stored."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "151e8efd",
   "metadata": {},
   "source": [
    "---\n",
    "## Multi-LoRA inference\n",
    "\n",
    "Inference can be performed by sending POST requests to the `/completions` endpoint.\n",
    "\n",
    "A few things to note:\n",
    "* The `model` parameter in the payload specifies the model that the request will be directed to. This can be the base model `meta/llama3-8b-instruct`, or any of the LoRA models, such as `llama3-8b-pubmed-qa`.\n",
    "* `max_tokens` parameter specifies the maximum number of tokens to generate. At any point, the cumulative number of input prompt tokens and specified number of output tokens to generate should not exceed the model's maximum context limit. For llama3-8b-instruct, the context length supported is 8192 tokens.\n",
    "\n",
    "Following code snippets show how it's possible to send requests belonging to different LoRAs (or tasks). NIM dynamically loads the LoRA adapters and serves the requests. It also internally handles the batching of requests belonging to different LoRAs to allow better performance and more efficient of compute."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49789d64-c07c-43ed-8ace-0167d6daf415",
   "metadata": {},
   "source": [
    "### PubMedQA\n",
    "\n",
    "If you have trained the PubMedQA LoRA model and made it available via NIM inference, try sending an example from the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2dfd2083",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "url = 'http://0.0.0.0:8000/v1/completions'\n",
    "headers = {\n",
    "    'accept': 'application/json',\n",
    "    'Content-Type': 'application/json'\n",
    "}\n",
    "\n",
    "# Example from the PubMedQA test set\n",
    "prompt=\"BACKGROUND: Sublingual varices have earlier been related to ageing, smoking and cardiovascular disease. The aim of this study was to investigate whether sublingual varices are related to presence of hypertension.\\nMETHODS: In an observational clinical study among 431 dental patients tongue status and blood pressure were documented. Digital photographs of the lateral borders of the tongue for grading of sublingual varices were taken, and blood pressure was measured. Those patients without previous diagnosis of hypertension and with a noted blood pressure \\u2265 140 mmHg and/or \\u2265 90 mmHg at the dental clinic performed complementary home blood pressure during one week. Those with an average home blood pressure \\u2265 135 mmHg and/or \\u2265 85 mmHg were referred to the primary health care centre, where three office blood pressure measurements were taken with one week intervals. Two independent blinded observers studied the photographs of the tongues. Each photograph was graded as none/few (grade 0) or medium/severe (grade 1) presence of sublingual varices. Pearson's Chi-square test, Student's t-test, and multiple regression analysis were applied. Power calculation stipulated a study population of 323 patients.\\nRESULTS: An association between sublingual varices and hypertension was found (OR = 2.25, p<0.002). Mean systolic blood pressure was 123 and 132 mmHg in patients with grade 0 and grade 1 sublingual varices, respectively (p<0.0001, CI 95 %). Mean diastolic blood pressure was 80 and 83 mmHg in patients with grade 0 and grade 1 sublingual varices, respectively (p<0.005, CI 95 %). Sublingual varices indicate hypertension with a positive predictive value of 0.5 and a negative predictive value of 0.80.\\nQUESTION: Is there a connection between sublingual varices and hypertension?\\n ### ANSWER (yes|no|maybe): \"\n",
    "\n",
    "data = {\n",
    "    \"model\": \"llama3-8b-pubmed-qa\",\n",
    "    \"prompt\": prompt,\n",
    "    \"max_tokens\": 128\n",
    "}\n",
    "\n",
    "response = requests.post(url, headers=headers, json=data)\n",
    "response_data = response.json()\n",
    "\n",
    "print(json.dumps(response_data, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8292214a-2b53-41dd-97c7-1ed93877bf01",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1877e910-ed46-417a-8b0f-89f13d9bdafb",
   "metadata": {},
   "source": [
    "### Grade School Math (GSM8K dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "256d3771-b6a6-4d0d-89ef-680dbb34e515",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "url = 'http://0.0.0.0:8000/v1/completions'\n",
    "headers = {\n",
    "    'accept': 'application/json',\n",
    "    'Content-Type': 'application/json'\n",
    "}\n",
    "\n",
    "prompt = '''Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? Answer:'''\n",
    "\n",
    "data = {\n",
    "    \"model\": \"llama3-8b-instruct-lora_vnemo-math-v1\",\n",
    "    \"prompt\": prompt,\n",
    "    \"max_tokens\": 128\n",
    "}\n",
    "\n",
    "response = requests.post(url, headers=headers, json=data)\n",
    "response_data = response.json()\n",
    "\n",
    "print(json.dumps(response_data, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f56d091-ce70-44ea-a705-e350eb4d6e31",
   "metadata": {},
   "source": [
    "### Extractive Question-Answering (SQuAD)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f50aa6e-0b9a-4834-b7d6-51a48f16eea6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "url = 'http://0.0.0.0:8000/v1/completions'\n",
    "headers = {\n",
    "    'accept': 'application/json',\n",
    "    'Content-Type': 'application/json'\n",
    "}\n",
    "\n",
    "prompt = '''CONTEXT: \"The Norman dynasty had a major political, cultural and military impact on medieval Europe and even the Near East. The Normans were famed for their martial spirit and eventually for their Christian piety, becoming exponents of the Catholic orthodoxy into which they assimilated. They adopted the Gallo-Romance language of the Frankish land they settled, their dialect becoming known as Norman, Normaund or Norman French, an important literary language. The Duchy of Normandy, which they formed by treaty with the French crown, was a great fief of medieval France, and under Richard I of Normandy was forged into a cohesive and formidable principality in feudal tenure. The Normans are noted both for their culture, such as their unique Romanesque architecture and musical traditions, and for their significant military accomplishments and innovations. Norman adventurers founded the Kingdom of Sicily under Roger II after conquering southern Italy on the Saracens and Byzantines, and an expedition on behalf of their duke, William the Conqueror, led to the Norman conquest of England at the Battle of Hastings in 1066. Norman cultural and military influence spread from these new European centres to the Crusader states of the Near East, where their prince Bohemond I founded the Principality of Antioch in the Levant, to Scotland and Wales in Great Britain, to Ireland, and to the coasts of north Africa and the Canary Islands.\\nQUESTION: What were the Norman dynasty famous for? ANSWER:'''\n",
    "data = {\n",
    "    \"model\": \"llama3-8b-instruct-lora_vnemo-squad-v1\",\n",
    "    \"prompt\": prompt,\n",
    "    \"max_tokens\": 128\n",
    "}\n",
    "\n",
    "response = requests.post(url, headers=headers, json=data)\n",
    "response_data = response.json()\n",
    "\n",
    "print(json.dumps(response_data, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b65afd7a",
   "metadata": {},
   "source": [
    "---\n",
    "## (Optional) Testing the accuracy of NIM inference\n",
    "\n",
    "If you followed the previous notebook on training a Llama-3-8b-Instruct LoRA adapter using NeMo Framework and evaluated the model accuracy, you can test the same using NIM inference for validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7516c8c7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Ensure that the path to PubMedQA test data is correct\n",
    "data_test = json.load(open(\"./pubmedqa/data/test_set.json\",'rt'))\n",
    "\n",
    "def read_jsonl (fname):\n",
    "    obj = []\n",
    "    with open(fname, 'rt') as f:\n",
    "        st = f.readline()\n",
    "        while st:\n",
    "            obj.append(json.loads(st))\n",
    "            st = f.readline()\n",
    "    return obj\n",
    "\n",
    "prepared_test = read_jsonl(\"./pubmedqa/data/pubmedqa_test.jsonl\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68511ac9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Send an inference request to the PubMedQA LoRA model\n",
    "def infer(prompt):\n",
    "\n",
    "    url = 'http://0.0.0.0:8000/v1/completions'\n",
    "    headers = {\n",
    "        'accept': 'application/json',\n",
    "        'Content-Type': 'application/json'\n",
    "    }\n",
    "\n",
    "    data = {\n",
    "        \"model\": \"llama3-8b-pubmed-qa\",\n",
    "        \"prompt\": prompt,\n",
    "        \"max_tokens\": 128\n",
    "    }\n",
    "\n",
    "    response = requests.post(url, headers=headers, json=data)\n",
    "    response_data = response.json()\n",
    "\n",
    "    return(response_data[\"choices\"][0][\"text\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4f44cd6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from tqdm import tqdm\n",
    "\n",
    "results = {}\n",
    "sample_id = list(data_test.keys())\n",
    "\n",
    "for i, key in tqdm(enumerate(sample_id)):\n",
    "    answer = infer(prepared_test[i]['input'].strip())\n",
    "    answer = answer.lower()\n",
    "    if 'yes' in answer:\n",
    "        results[key] = 'yes'\n",
    "    elif 'no' in answer:\n",
    "        results[key] = 'no'\n",
    "    elif 'maybe' in answer:\n",
    "        results[key] = 'maybe'\n",
    "    else:\n",
    "        print(\"Malformed answer: \", answer)\n",
    "        results[key] = 'maybe'\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "319f49ba-0b57-486e-977b-06c89466af60",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "answer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9942a1d6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# dump results\n",
    "FILENAME=\"pubmedqa-llama-3-8b-lora-NIM.json\"\n",
    "with(open(FILENAME, \"w\")) as f:\n",
    "    json.dump(results, f)\n",
    "\n",
    "# Evaluation\n",
    "!cp $FILENAME ./pubmedqa/\n",
    "!cd ./pubmedqa/ && python evaluation.py $FILENAME"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d014d79",
   "metadata": {},
   "source": [
    "NIM inference should provide comparable accuracy to NeMo Framework inference.\n",
    "\n",
    "Note that each individual answer also conform to the format we specified, i.e. `<<< {answer} >>>`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
